Overview

Dataset statistics

Number of variables: 11
Number of observations: 935
Missing cells: 272
Missing cells (%): 2.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 80.5 KiB
Average record size in memory: 88.1 B
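
The derived figures in this block follow from the raw counts. The sketch below is only an illustration: it assumes the total cell count is variables × observations and takes the per-column missing counts (meduc 78, feduc 194) from the Warnings section. All variable names are made up for the example.

```python
# Raw counts as reported in this profile.
n_variables = 11
n_observations = 935
missing_by_column = {"meduc": 78, "feduc": 194}  # from the Warnings section
total_size_bytes = 80.5 * 1024                   # 80.5 KiB, an already rounded figure

total_cells = n_variables * n_observations
missing_cells = sum(missing_by_column.values())
missing_pct = 100 * missing_cells / total_cells
avg_record_bytes = total_size_bytes / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells: {missing_cells} ({missing_pct:.1f}%)")
print(f"Approximate record size: {avg_record_bytes:.0f} B")
```

Because the total size is itself rounded, the last value only approximates the reported 88.1 B.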

Variable types

Numeric: 9
Categorical: 2

Warnings

IQ is highly correlated with educ (High correlation)
educ is highly correlated with IQ (High correlation)
meduc is highly correlated with feduc (High correlation)
feduc is highly correlated with meduc (High correlation)
age is highly correlated with exper (High correlation)
IQ is highly correlated with educ and 1 other field (High correlation)
educ is highly correlated with IQ and 1 other field (High correlation)
tenure is highly correlated with exper (High correlation)
black is highly correlated with IQ (High correlation)
exper is highly correlated with age and 2 other fields (High correlation)
meduc has 78 (8.3%) missing values (Missing)
feduc has 194 (20.7%) missing values (Missing)
tenure has 30 (3.2%) zeros (Zeros)

Reproduction

Analysis started: 2022-01-19 23:12:01.571208
Analysis finished: 2022-01-19 23:12:14.763046
Duration: 13.19 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

wage
Real number (ℝ≥0)

Distinct: 449
Distinct (%): 48.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 957945.4545
Minimum: 115000
Maximum: 3078000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 115000
5th percentile: 437900
Q1: 669000
Median: 905000
Q3: 1160000
95th percentile: 1695500
Maximum: 3078000
Range: 2963000
Interquartile range (IQR): 491000

Descriptive statistics

Standard deviation: 404360.8225
Coefficient of variation (CV): 0.4221125749
Kurtosis: 2.717581799
Mean: 957945.4545
Median Absolute Deviation (MAD): 249000
Skewness: 1.201186794
Sum: 895679000
Variance: 1.635076748 × 10^11
Monotonicity: Not monotonic
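
Several of the wage statistics can be cross-checked against one another. The sketch below is a sanity check rather than the profiler's own code: it recomputes the range, IQR, coefficient of variation, variance and mean from the other reported figures, using the standard definitions. The variable names are illustrative.

```python
# wage statistics copied from this profile.
minimum, maximum = 115_000, 3_078_000
q1, q3 = 669_000, 1_160_000
mean, std = 957_945.4545, 404_360.8225
total, n = 895_679_000, 935

value_range = maximum - minimum   # compare with the reported Range
iqr = q3 - q1                     # compare with the reported Interquartile range (IQR)
cv = std / mean                   # compare with the reported Coefficient of variation (CV)
variance = std ** 2               # compare with the reported Variance
mean_from_sum = total / n         # compare with the reported Mean

print(value_range, iqr)
print(f"CV = {cv:.4f}, variance = {variance:.4e}, mean = {mean_from_sum:.4f}")
```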
Histogram with fixed size bins (bins=50)
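
Fixed-size binning divides the observed range into equal-width intervals. A minimal sketch, assuming the usual convention that every bin is half-open except the last, which also includes the maximum. The function name is hypothetical; the minimum, maximum and bin count come from the wage profile, and the ten wage values are the "First rows" of the Sample section.

```python
def fixed_width_histogram(values, bins, lo, hi):
    """Count values in `bins` equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum would land on the right edge, so clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return width, counts

wages = [769000, 808000, 825000, 650000, 562000,
         1400000, 600000, 1081000, 1154000, 1000000]
width, counts = fixed_width_histogram(wages, bins=50, lo=115000, hi=3078000)
print(f"bin width = {width}")
```

With 50 bins over a range of 2963000, each bin covers 59260 wage units.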

Common values
Value               Count  Frequency (%)
1000000             31     3.3%
1250000             17     1.8%
800000              15     1.6%
500000              13     1.4%
962000              13     1.4%
1442000             12     1.3%
600000              11     1.2%
900000              11     1.2%
750000              11     1.2%
1200000             10     1.1%
Other values (439)  791    84.6%

Smallest values
Value               Count  Frequency (%)
115000              1      0.1%
200000              1      0.1%
233000              1      0.1%
260000              1      0.1%
265000              1      0.1%
289000              1      0.1%
300000              1      0.1%
310000              1      0.1%
318000              1      0.1%
325000              2      0.2%

Largest values
Value               Count  Frequency (%)
3078000             2      0.2%
2771000             1      0.1%
2668000             1      0.1%
2500000             2      0.2%
2404000             2      0.2%
2310000             1      0.1%
2308000             1      0.1%
2162000             3      0.3%
2137000             4      0.4%
2099000             1      0.1%

hours
Real number (ℝ≥0)

Distinct: 37
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43.92941176
Minimum: 20
Maximum: 80
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 20
5th percentile: 38
Q1: 40
Median: 40
Q3: 48
95th percentile: 60
Maximum: 80
Range: 60
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 7.224255863
Coefficient of variation (CV): 0.1644514591
Kurtosis: 4.18664052
Mean: 43.92941176
Median Absolute Deviation (MAD): 0
Skewness: 1.596175194
Sum: 41074
Variance: 52.18987278
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=37)

Common values
Value               Count  Frequency (%)
40                  497    53.2%
45                  97     10.4%
50                  91     9.7%
55                  41     4.4%
48                  35     3.7%
60                  32     3.4%
44                  19     2.0%
38                  15     1.6%
43                  14     1.5%
35                  11     1.2%
Other values (27)   83     8.9%

Smallest values
Value               Count  Frequency (%)
20                  1      0.1%
23                  1      0.1%
24                  1      0.1%
25                  1      0.1%
27                  2      0.2%
30                  7      0.7%
32                  5      0.5%
34                  1      0.1%
35                  11     1.2%
36                  4      0.4%

Largest values
Value               Count  Frequency (%)
80                  4      0.4%
75                  2      0.2%
70                  7      0.7%
65                  7      0.7%
64                  1      0.1%
61                  1      0.1%
60                  32     3.4%
59                  1      0.1%
58                  2      0.2%
56                  3      0.3%

IQ
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 80
Distinct (%): 8.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 101.2823529
Minimum: 50
Maximum: 145
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 50
5th percentile: 74
Q1: 92
Median: 102
Q3: 112
95th percentile: 124.3
Maximum: 145
Range: 95
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 15.05263637
Coefficient of variation (CV): 0.148620524
Kurtosis: -0.01664359899
Mean: 101.2823529
Median Absolute Deviation (MAD): 10
Skewness: -0.3409718687
Sum: 94699
Variance: 226.5818617
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value               Count  Frequency (%)
96                  35     3.7%
104                 35     3.7%
109                 33     3.5%
98                  30     3.2%
97                  28     3.0%
110                 28     3.0%
105                 27     2.9%
106                 26     2.8%
101                 23     2.5%
108                 22     2.4%
Other values (70)   648    69.3%

Smallest values
Value               Count  Frequency (%)
50                  1      0.1%
54                  1      0.1%
55                  1      0.1%
59                  1      0.1%
60                  1      0.1%
61                  1      0.1%
62                  2      0.2%
63                  1      0.1%
64                  2      0.2%
65                  1      0.1%

Largest values
Value               Count  Frequency (%)
145                 1      0.1%
137                 1      0.1%
134                 4      0.4%
132                 5      0.5%
131                 4      0.4%
130                 3      0.3%
129                 4      0.4%
128                 4      0.4%
127                 6      0.6%
126                 4      0.4%

educ
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.4684492
Minimum: 9
Maximum: 18
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 9
5th percentile: 11
Q1: 12
Median: 12
Q3: 16
95th percentile: 18
Maximum: 18
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.196653882
Coefficient of variation (CV): 0.1630962741
Kurtosis: -0.7348626865
Mean: 13.4684492
Median Absolute Deviation (MAD): 1
Skewness: 0.5486765038
Sum: 12593
Variance: 4.825288278
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=10)

Common values
Value               Count  Frequency (%)
12                  393    42.0%
16                  150    16.0%
13                  85     9.1%
14                  77     8.2%
18                  57     6.1%
15                  45     4.8%
11                  43     4.6%
17                  40     4.3%
10                  35     3.7%
9                   10     1.1%

Smallest values
Value               Count  Frequency (%)
9                   10     1.1%
10                  35     3.7%
11                  43     4.6%
12                  393    42.0%
13                  85     9.1%
14                  77     8.2%
15                  45     4.8%
16                  150    16.0%
17                  40     4.3%
18                  57     6.1%

Largest values
Value               Count  Frequency (%)
18                  57     6.1%
17                  40     4.3%
16                  150    16.0%
15                  45     4.8%
14                  77     8.2%
13                  85     9.1%
12                  393    42.0%
11                  43     4.6%
10                  35     3.7%
9                   10     1.1%

exper
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 22
Distinct (%): 2.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.56363636
Minimum: 1
Maximum: 23
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 1
5th percentile: 5
Q1: 8
Median: 11
Q3: 15
95th percentile: 19
Maximum: 23
Range: 22
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 4.374586384
Coefficient of variation (CV): 0.3783054263
Kurtosis: -0.5637954492
Mean: 11.56363636
Median Absolute Deviation (MAD): 3
Skewness: 0.07780088502
Sum: 10812
Variance: 19.13700603
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=22)

Common values
Value               Count  Frequency (%)
11                  89     9.5%
9                   82     8.8%
8                   72     7.7%
10                  72     7.7%
16                  68     7.3%
12                  65     7.0%
13                  62     6.6%
15                  60     6.4%
7                   54     5.8%
14                  54     5.8%
Other values (12)   257    27.5%

Smallest values
Value               Count  Frequency (%)
1                   12     1.3%
3                   1      0.1%
4                   29     3.1%
5                   30     3.2%
6                   48     5.1%
7                   54     5.8%
8                   72     7.7%
9                   82     8.8%
10                  72     7.7%
11                  89     9.5%

Largest values
Value               Count  Frequency (%)
23                  2      0.2%
22                  3      0.3%
21                  12     1.3%
20                  14     1.5%
19                  23     2.5%
18                  30     3.2%
17                  53     5.7%
16                  68     7.3%
15                  60     6.4%
14                  54     5.8%

tenure
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 23
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.234224599
Minimum: 0
Maximum: 22
Zeros: 30
Zeros (%): 3.2%
Negative: 0
Negative (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 3
Median: 7
Q3: 11
95th percentile: 16
Maximum: 22
Range: 22
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 5.075205802
Coefficient of variation (CV): 0.701554912
Kurtosis: -0.7985985761
Mean: 7.234224599
Median Absolute Deviation (MAD): 4
Skewness: 0.4325322037
Sum: 6764
Variance: 25.75771393
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=23)

Common values
Value               Count  Frequency (%)
1                   104    11.1%
2                   93     9.9%
3                   72     7.7%
9                   71     7.6%
5                   68     7.3%
4                   59     6.3%
10                  58     6.2%
7                   56     6.0%
12                  53     5.7%
8                   48     5.1%
Other values (13)   253    27.1%

Smallest values
Value               Count  Frequency (%)
0                   30     3.2%
1                   104    11.1%
2                   93     9.9%
3                   72     7.7%
4                   59     6.3%
5                   68     7.3%
6                   19     2.0%
7                   56     6.0%
8                   48     5.1%
9                   71     7.6%

Largest values
Value               Count  Frequency (%)
22                  1      0.1%
21                  2      0.2%
20                  4      0.4%
19                  6      0.6%
18                  14     1.5%
17                  9      1.0%
16                  22     2.4%
15                  38     4.1%
14                  28     3.0%
13                  40     4.3%

age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 11
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.0802139
Minimum: 28
Maximum: 38
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 28
5th percentile: 29
Q1: 30
Median: 33
Q3: 36
95th percentile: 38
Maximum: 38
Range: 10
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.107803254
Coefficient of variation (CV): 0.09394749573
Kurtosis: -1.257093959
Mean: 33.0802139
Median Absolute Deviation (MAD): 3
Skewness: 0.1187358742
Sum: 30930
Variance: 9.658441068
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=11)

Common values
Value               Count  Frequency (%)
30                  120    12.8%
32                  99     10.6%
38                  99     10.6%
31                  98     10.5%
36                  95     10.2%
29                  86     9.2%
37                  82     8.8%
33                  81     8.7%
34                  69     7.4%
35                  61     6.5%

Smallest values
Value               Count  Frequency (%)
28                  45     4.8%
29                  86     9.2%
30                  120    12.8%
31                  98     10.5%
32                  99     10.6%
33                  81     8.7%
34                  69     7.4%
35                  61     6.5%
36                  95     10.2%
37                  82     8.8%

Largest values
Value               Count  Frequency (%)
38                  99     10.6%
37                  82     8.8%
36                  95     10.2%
35                  61     6.5%
34                  69     7.4%
33                  81     8.7%
32                  99     10.6%
31                  98     10.5%
30                  120    12.8%
29                  86     9.2%

married
Categorical

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.4 KiB

Value  Count
1      835
0      100

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 935
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      835    89.3%
0      100    10.7%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
1      835    89.3%
0      100    10.7%

Most occurring characters

Value  Count  Frequency (%)
1      835    89.3%
0      100    10.7%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  935    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      835    89.3%
0      100    10.7%

Most occurring scripts

Value   Count  Frequency (%)
Common  935    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
1      835    89.3%
0      100    10.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  935    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1      835    89.3%
0      100    10.7%

black
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.4 KiB

Value  Count
0      815
1      120

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 935
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      815    87.2%
1      120    12.8%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
0      815    87.2%
1      120    12.8%

Most occurring characters

Value  Count  Frequency (%)
0      815    87.2%
1      120    12.8%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  935    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      815    87.2%
1      120    12.8%

Most occurring scripts

Value   Count  Frequency (%)
Common  935    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      815    87.2%
1      120    12.8%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  935    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      815    87.2%
1      120    12.8%

meduc
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 19
Distinct (%): 2.2%
Missing: 78
Missing (%): 8.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.68261377
Minimum: 0
Maximum: 18
Zeros: 3
Zeros (%): 0.3%
Negative: 0
Negative (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 6
Q1: 8
Median: 12
Q3: 12
95th percentile: 16
Maximum: 18
Range: 18
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.849756291
Coefficient of variation (CV): 0.2667658265
Kurtosis: 0.9440547428
Mean: 10.68261377
Median Absolute Deviation (MAD): 1
Skewness: -0.4977403036
Sum: 9155
Variance: 8.121110917
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=19)

Common values
Value               Count  Frequency (%)
12                  357    38.2%
8                   129    13.8%
10                  65     7.0%
11                  56     6.0%
9                   47     5.0%
16                  42     4.5%
7                   31     3.3%
6                   30     3.2%
14                  28     3.0%
13                  21     2.2%
Other values (9)    51     5.5%
(Missing)           78     8.3%

Smallest values
Value               Count  Frequency (%)
0                   3      0.3%
1                   1      0.1%
2                   5      0.5%
3                   9      1.0%
4                   6      0.6%
5                   8      0.9%
6                   30     3.2%
7                   31     3.3%
8                   129    13.8%
9                   47     5.0%

Largest values
Value               Count  Frequency (%)
18                  5      0.5%
17                  7      0.7%
16                  42     4.5%
15                  7      0.7%
14                  28     3.0%
13                  21     2.2%
12                  357    38.2%
11                  56     6.0%
10                  65     7.0%
9                   47     5.0%

feduc
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 18
Distinct (%): 2.4%
Missing: 194
Missing (%): 20.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.21727395
Minimum: 0
Maximum: 18
Zeros: 1
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 5
Q1: 8
Median: 10
Q3: 12
95th percentile: 16
Maximum: 18
Range: 18
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.300699945
Coefficient of variation (CV): 0.323050939
Kurtosis: -0.0283119831
Mean: 10.21727395
Median Absolute Deviation (MAD): 2
Skewness: -0.04346897555
Sum: 7571
Variance: 10.89462013
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=18)

Common values
Value               Count  Frequency (%)
12                  216    23.1%
8                   122    13.0%
10                  77     8.2%
6                   41     4.4%
11                  40     4.3%
9                   39     4.2%
16                  38     4.1%
7                   37     4.0%
14                  28     3.0%
5                   22     2.4%
Other values (8)    81     8.7%
(Missing)           194    20.7%

Smallest values
Value               Count  Frequency (%)
0                   1      0.1%
2                   8      0.9%
3                   9      1.0%
4                   17     1.8%
5                   22     2.4%
6                   41     4.4%
7                   37     4.0%
8                   122    13.0%
9                   39     4.2%
10                  77     8.2%

Largest values
Value               Count  Frequency (%)
18                  16     1.7%
17                  6      0.6%
16                  38     4.1%
15                  7      0.7%
14                  28     3.0%
13                  17     1.8%
12                  216    23.1%
11                  40     4.3%
10                  77     8.2%
9                   39     4.2%

Interactions

[81 interaction plots omitted]

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
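
The calculation described above can be sketched in a few lines of plain Python. This is an illustration of the textbook definition, not the profiler's implementation. The function name is hypothetical, and the inputs are the IQ and educ values of the ten "First rows" in the Sample section, so the result does not match the full-data coefficient.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)

# IQ and educ for the ten "First rows" shown in the Sample section.
iq = [93, 119, 108, 96, 74, 116, 91, 114, 111, 95]
educ = [12, 18, 14, 12, 11, 16, 10, 18, 15, 12]

r = pearson_r(iq, educ)
# Shifting and rescaling one variable by a positive factor leaves r unchanged.
r_rescaled = pearson_r([2.5 * v + 10 for v in iq], educ)
print(round(r, 3), round(r_rescaled, 3))
```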

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
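
A minimal sketch of the rank-based calculation, assuming ties receive the average of the ranks they span (a common convention, stated here as an assumption). The helper names are hypothetical; the data are again the IQ and educ values of the ten "First rows" in the Sample section.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson's r applied to the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

iq = [93, 119, 108, 96, 74, 116, 91, 114, 111, 95]
educ = [12, 18, 14, 12, 11, 16, 10, 18, 15, 12]
print(round(spearman_rho(iq, educ), 3))
```

For a nonlinear but strictly increasing relationship such as y = x³, ρ stays at 1 while r falls below 1, which is the behaviour the paragraph above describes.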

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
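
The pair-counting rule above corresponds to the variant usually called τ-a. A minimal sketch under that reading follows; the function name is hypothetical. Tied pairs count as neither concordant nor discordant, so heavily tied data yields values closer to 0 than tie-adjusted variants would (SciPy's `kendalltau`, for instance, defaults to τ-b).

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1      # both variables move in the same direction
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it contributes to neither count
    return (concordant - discordant) / len(pairs)

# Three observations: two concordant pairs and one discordant pair.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```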

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
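
A sketch of the bias-corrected estimator, following the correction commonly attributed to Bergsma (2013): the chi-squared-based φ² and the table dimensions are each shrunk by a term of order 1/(n-1). The function name and both contingency tables are hypothetical, since the report lists only marginal counts for married and black, not their joint counts. The sketch also assumes every row and column total is non-zero.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table of counts."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

independent = [[10, 20], [20, 40]]   # rows proportional: no association
perfect = [[50, 0], [0, 50]]         # each row maps to exactly one column
print(cramers_v_corrected(independent), cramers_v_corrected(perfect))
```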

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
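
Nullity correlation can be read as the Pearson correlation between two presence indicators (1 where a value is present, 0 where it is missing). The sketch below assumes that reading; the function name is hypothetical, and returning `None` for a column with no variation in presence is a choice made for this example. The inputs are the meduc and feduc values of the 20 rows in the Sample section, with `None` standing in for NaN.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between presence indicators (1 = present, 0 = missing)."""
    a = [0 if v is None else 1 for v in col_a]
    b = [0 if v is None else 1 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return None  # always present (or always missing): correlation undefined
    return cov / math.sqrt(var_a * var_b)

meduc = [8, 14, 14, 12, 6, 8, 8, 8, 14, 12,
         7, 9, 12, 7, 16, 11, 8, 7, None, None]
feduc = [8, 14, 14, 12, 11, None, 8, None, 5, 11,
         8, None, None, 7, 16, None, 6, None, 11, None]
print(nullity_correlation(meduc, feduc))
```

A value near +1 means the two columns tend to be missing together, and a value near -1 means one tends to be present exactly when the other is missing.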

Sample

First rows

     wage       hours  IQ   educ  exper  tenure  age  married  black  meduc  feduc
0    769000.0   40     93   12    11     2       31   1        0      8.0    8.0
1    808000.0   50     119  18    11     16      37   1        0      14.0   14.0
2    825000.0   40     108  14    11     9       33   1        0      14.0   14.0
3    650000.0   40     96   12    13     7       32   1        0      12.0   12.0
4    562000.0   40     74   11    14     5       34   1        0      6.0    11.0
5    1400000.0  40     116  16    14     2       35   1        1      8.0    NaN
6    600000.0   40     91   10    13     0       30   0        0      8.0    8.0
7    1081000.0  40     114  18    8      14      38   1        0      8.0    NaN
8    1154000.0  45     111  15    13     1       36   1        0      14.0   5.0
9    1000000.0  40     95   12    16     16      36   1        0      12.0   11.0

Last rows

     wage       hours  IQ   educ  exper  tenure  age  married  black  meduc  feduc
925  645000.0   45     93   12    11     3       35   1        0      7.0    8.0
926  788000.0   40     100  11    15     6       32   1        1      9.0    NaN
927  644000.0   42     101  12    11     5       33   1        0      12.0   NaN
928  477000.0   45     100  12    9      3       31   1        0      7.0    7.0
929  664000.0   60     82   16    10     9       34   1        1      16.0   16.0
930  520000.0   40     79   16    6      1       30   1        1      11.0   NaN
931  1202000.0  40     102  13    10     3       31   1        0      8.0    6.0
932  538000.0   45     77   12    12     10      28   1        1      7.0    NaN
933  873000.0   44     109  12    12     12      28   1        0      NaN    11.0
934  1000000.0  40     107  12    17     18      35   1        0      NaN    NaN